Determining the Number of Non-Spurious Arcs in a Learned DAG Model

Authors

  • Jennifer Listgarten
  • David Heckerman
Abstract

In many application areas where graphical models are used and where their structure is learned from data, the end goal is neither prediction nor density estimation. Rather, it is the uncovering of discrete relationships between entities. For example, in computational biology, one may be interested in discovering which proteins within a large set interact with one another. In these problems, relationships can be represented by arcs in a graphical model. Consequently, given a learned model, we are interested in knowing how many of the arcs are real, or non-spurious. In our approach to this problem, we estimate and control the False Discovery Rate (FDR) [1] of a set of arc hypotheses. The FDR is defined as the (expected) proportion of all hypotheses (e.g., arc hypotheses) which we label as true, but which are actually false (i.e., the number of false positives divided by the total number of hypotheses called true).

In our evaluations, we concentrate on directed acyclic graphs (DAGs) for discrete variables with known variable orderings, as our problem of interest (related to HIV vaccine design) has these properties. We use the term arc hypothesis to denote the event that an arc is present in the underlying distribution of the data.

In a typical computation of FDR, we are given a set of hypotheses where each hypothesis, i, is assigned a score, s_i (traditionally, a test statistic, or the p-value resulting from such a test statistic). These scores are often assumed to be independent and identically distributed, although there has been much work to relax the assumption of independence [2]. The FDR is computed as a function of a threshold, t, on these scores, FDR = FDR(t). For threshold t, all hypotheses with s_i ≥ t are said to be significant (assuming, without loss of generality, that the higher a score, the more we believe a hypothesis). The FDR at threshold t is then given by FDR(t) = E [
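The verbal definition above (false positives divided by the number of hypotheses called true at threshold t) can be illustrated with a minimal Python sketch. It computes the realized proportion for a single labeled data set; the FDR is the expectation of this quantity. The function name, the toy scores, and the ground-truth labels are all hypothetical, and truth labels are only available in simulation. The convention of returning 0 when nothing is called significant is an assumption, and this is not the estimation procedure of the paper.

```python
from typing import Sequence


def false_discovery_proportion(scores: Sequence[float],
                               is_true: Sequence[bool],
                               t: float) -> float:
    """Realized proportion of false hypotheses among those with score >= t."""
    if len(scores) != len(is_true):
        raise ValueError("scores and is_true must have the same length")
    # Hypotheses called significant at threshold t are those with s_i >= t.
    called = [truth for s, truth in zip(scores, is_true) if s >= t]
    if not called:
        # Assumed convention: with no discoveries, the proportion is 0.
        return 0.0
    false_positives = sum(1 for truth in called if not truth)
    return false_positives / len(called)


# Made-up scores and ground-truth labels for five arc hypotheses.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
is_true = [True, False, True, True, False]
fdp = false_discovery_proportion(scores, is_true, t=0.7)
```

With these toy values, three hypotheses pass the threshold 0.7 and one of them is false, so the realized proportion is one third. Averaging this quantity over many simulated data sets would approximate FDR(t) for that simulation setting.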


Similar resources

Determining the Number of Non-Spurious Arcs in a Learned DAG Model: Investigation of a Bayesian and a Frequentist Approach

In many application domains, such as computational biology, the goal of graphical model structure learning is to uncover discrete relationships between entities. For example, in our problem of interest concerning HIV vaccine design, we want to infer which HIV peptides interact with which immune system molecules (HLA molecules). For problems of this nature, we are interested in determining the n...


Stack and Queue Layouts of Directed Acyclic Graphs : Extended

Stack layouts and queue layouts of undirected graphs are used to model problems in fault tolerant computing, in VLSI design, and in managing the flow of data in a parallel processing system. In certain applications, such as managing the flow of data in a parallel processing system, it is more realistic to use layouts of directed acyclic graphs (dags) as a model. A stack layout of a dag consists of ...


t-Pancyclic Arcs in Tournaments

Let $T$ be a non-trivial tournament. An arc is \emph{$t$-pancyclic} in $T$ if it is contained in a cycle of length $\ell$ for every $t \leq \ell \leq |V(T)|$. Let $p^t(T)$ denote the number of $t$-pancyclic arcs in $T$ and $h^t(T)$ the maximum number of $t$-pancyclic arcs contained in the same Hamiltonian cycle of $T$. Moon ({\em J. Combin. Inform. System Sci.}, {\bf 19} (1994), 207-214) showed that $...


A Novel Qualitative State Observer

The state estimation of a quantized system (Q.S.) is a challenging problem for designing feedback control and model-based fault diagnosis algorithms. The core of a Q.S. is a continuous variable system whose inputs and outputs are represented by their corresponding quantized values. This paper is concerned with state estimation of a Q.S. by a qualitative observer. The presented observer in this pape...


Coverings, matchings and paired domination in fuzzy graphs using strong arcs

The concepts of covering and matching in fuzzy graphs using strong arcs are introduced, and the relationship between them, analogous to Gallai’s results in graphs, is obtained. The notion of paired domination in fuzzy graphs using strong arcs is also studied. The strong paired domination number γ_spr of the complete fuzzy graph and the complete bipartite fuzzy graph is determined, and bounds are obtained for the s...



Publication date: 2008